Overview

Dataset statistics

Number of variables 12
Number of observations 47478
Missing cells 8160
Missing cells (%) 1.4%
Duplicate rows 0
Duplicate rows (%) 0.0%
Total size in memory 4.3 MiB
Average record size in memory 96.0 B
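
The overview figures can be cross-checked with a few lines of arithmetic. The sketch below recomputes the missing-cell percentage and the memory figures from the counts above. The 8 bytes per value is an assumption inferred from the 96.0 B record size spread over 12 variables; the report does not state it.

```python
# Counts copied from the dataset statistics above.
n_variables = 12
n_observations = 47478
missing_cells = 8160

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: every value occupies 8 bytes (96.0 B per record / 12 variables).
bytes_per_value = 8
record_bytes = n_variables * bytes_per_value
total_mib = record_bytes * n_observations / 2**20

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Record size: {record_bytes} B")
print(f"Total size: {total_mib:.1f} MiB")
```

Under that assumption the three printed values agree with the reported 1.4%, 96.0 B and 4.3 MiB.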

Variable types

Text 1
DateTime 1
Categorical 6
Numeric 4

Alerts

victim_age is highly overall correlated with age_range High correlation
lat is highly overall correlated with city and 1 other field High correlation
lon is highly overall correlated with city and 1 other field High correlation
city is highly overall correlated with lat and 2 other fields High correlation
state is highly overall correlated with lat and 2 other fields High correlation
age_range is highly overall correlated with victim_age High correlation
victim_age has 4080 (8.6%) missing values Missing
age_range has 4080 (8.6%) missing values Missing
uid has unique values Unique

Reproduction

Analysis started 2023-09-11 18:44:31.255201
Analysis finished 2023-09-11 18:44:39.957800
Duration 8.7 seconds
Software version ydata-profiling v4.3.2
Download configuration config.json

Variables

uid
Text

UNIQUE 

Distinct 47478
Distinct (%) 100.0%
Missing 0
Missing (%) 0.0%
Memory size 371.0 KiB

Length

Max length 10
Median length 10
Mean length 9.9109482
Min length 9
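
Because every uid is either 9 or 10 characters long, the total character count reported for this variable (470552) determines how many identifiers have each length. A minimal sketch of that calculation:

```python
n_uids = 47478
total_characters = 470552  # "Total characters" reported for uid

# Only lengths 9 and 10 occur, so: total = 10 * n_uids - n_short
n_short = 10 * n_uids - total_characters  # uids with 9 characters
n_long = n_uids - n_short                 # uids with 10 characters
mean_length = total_characters / n_uids

print(n_short, n_long, round(mean_length, 7))
```

The recomputed mean agrees with the reported mean length of 9.9109482.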

Characters and Unicode

Total characters 470552
Distinct characters 47
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 47478
Unique (%) 100.0%

Sample

1st row Alb-000001
2nd row Alb-000002
3rd row Alb-000003
4th row Alb-000004
5th row Alb-000005
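
The sample rows and length statistics suggest that each uid is a short city prefix, a dash, and a zero-padded serial number. The parser below is a sketch under that assumption; the pattern (a 2- or 3-letter prefix followed by exactly six digits) is inferred from the report's counts, and `parse_uid` is a hypothetical helper, not part of the dataset or of ydata-profiling.

```python
import re

# Assumed format: 2- or 3-letter city prefix, a dash, a 6-digit serial.
UID_PATTERN = re.compile(r"^([A-Za-z]{2,3})-(\d{6})$")


def parse_uid(uid):
    """Return (prefix, serial) for a well-formed uid, or None otherwise."""
    match = UID_PATTERN.match(uid)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


print(parse_uid("Alb-000001"))
print(parse_uid("not-a-uid"))
```

Values that fail to parse would be worth inspecting before the prefix is used as a join key.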
Value Count Frequency (%)
alb-000001 1 < 0.1%
alb-000010 1 < 0.1%
alb-000025 1 < 0.1%
alb-000024 1 < 0.1%
alb-000003 1 < 0.1%
alb-000004 1 < 0.1%
alb-000005 1 < 0.1%
alb-000006 1 < 0.1%
alb-000007 1 < 0.1%
alb-000008 1 < 0.1%
Other values (47468) 47468 > 99.9%

Most occurring characters

Value Count Frequency (%)
0 137803 29.3%
- 47478 10.1%
1 23133 4.9%
2 18590 4.0%
3 18035 3.8%
4 16955 3.6%
7 16112 3.4%
5 14730 3.1%
6 14334 3.0%
i 12920 2.7%
Other values (37) 150462 32.0%

Most occurring categories

Value Count Frequency (%)
Decimal Number 284868 60.5%
Lowercase Letter 84839 18.0%
Uppercase Letter 53367 11.3%
Dash Punctuation 47478 10.1%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
i 12920 15.2%
a 11031 13.0%
h 9220 10.9%
o 8717 10.3%
l 7490 8.8%
t 6615 7.8%
s 6163 7.3%
u 4839 5.7%
e 4798 5.7%
n 3066 3.6%
Other values (8) 9980 11.8%

Uppercase Letter
Value Count Frequency (%)
C 7945 14.9%
L 6106 11.4%
B 5424 10.2%
S 4912 9.2%
P 3664 6.9%
D 3534 6.6%
M 3451 6.5%
O 3394 6.4%
H 2908 5.4%
N 2771 5.2%
Other values (8) 9258 17.3%

Decimal Number
Value Count Frequency (%)
0 137803 48.4%
1 23133 8.1%
2 18590 6.5%
3 18035 6.3%
4 16955 6.0%
7 16112 5.7%
5 14730 5.2%
6 14334 5.0%
8 12854 4.5%
9 12322 4.3%

Dash Punctuation
Value Count Frequency (%)
- 47478 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 332346 70.6%
Latin 138206 29.4%

Most frequent character per script

Latin
Value Count Frequency (%)
i 12920 9.3%
a 11031 8.0%
h 9220 6.7%
o 8717 6.3%
C 7945 5.7%
l 7490 5.4%
t 6615 4.8%
s 6163 4.5%
L 6106 4.4%
B 5424 3.9%
Other values (26) 56575 40.9%

Common
Value Count Frequency (%)
0 137803 41.5%
- 47478 14.3%
1 23133 7.0%
2 18590 5.6%
3 18035 5.4%
4 16955 5.1%
7 16112 4.8%
5 14730 4.4%
6 14334 4.3%
8 12854 3.9%

Most occurring blocks

Value Count Frequency (%)
ASCII 470552 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
0 137803 29.3%
- 47478 10.1%
1 23133 4.9%
2 18590 4.0%
3 18035 3.8%
4 16955 3.6%
7 16112 3.4%
5 14730 3.1%
6 14334 3.0%
i 12920 2.7%
Other values (37) 150462 32.0%

reported_date
DateTime

Distinct 4018
Distinct (%) 8.5%
Missing 0
Missing (%) 0.0%
Memory size 371.0 KiB
Minimum 2007-01-01 00:00:00
Maximum 2017-12-31 00:00:00
Histogram with fixed size bins (bins=50)
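
The distinct count of 4018 for the date variable equals the number of calendar days between its minimum and maximum, so every day in the range appears at least once. The check below is a sketch that assumes the values are whole dates, which the 00:00:00 times in the minimum and maximum suggest.

```python
from datetime import date

first_day = date(2007, 1, 1)
last_day = date(2017, 12, 31)

# Inclusive number of calendar days between the minimum and maximum.
days_in_range = (last_day - first_day).days + 1
distinct_dates = 4018  # "Distinct" value reported above

print(days_in_range, days_in_range == distinct_dates)
```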

victim_race
Categorical

Distinct 5
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 371.0 KiB
Black 33062
Hispanic 6817
White 6259
Asian 676
Other 664

Length

Max length 8
Median length 5
Mean length 5.4307469
Min length 5

Characters and Unicode

Total characters 257841
Distinct characters 17
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Hispanic
2nd row Hispanic
3rd row White
4th row Hispanic
5th row White

Common Values

Value Count Frequency (%)
Black 33062 69.6%
Hispanic 6817 14.4%
White 6259 13.2%
Asian 676 1.4%
Other 664 1.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
black 33062 69.6%
hispanic 6817 14.4%
white 6259 13.2%
asian 676 1.4%
other 664 1.4%

Most occurring characters

Value Count Frequency (%)
a 40555 15.7%
c 39879 15.5%
B 33062 12.8%
k 33062 12.8%
l 33062 12.8%
i 20569 8.0%
s 7493 2.9%
n 7493 2.9%
h 6923 2.7%
e 6923 2.7%
Other values (7) 28820 11.2%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 210363 81.6%
Uppercase Letter 47478 18.4%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
a 40555 19.3%
c 39879 19.0%
k 33062 15.7%
l 33062 15.7%
i 20569 9.8%
s 7493 3.6%
n 7493 3.6%
h 6923 3.3%
e 6923 3.3%
t 6923 3.3%
Other values (2) 7481 3.6%

Uppercase Letter
Value Count Frequency (%)
B 33062 69.6%
H 6817 14.4%
W 6259 13.2%
A 676 1.4%
O 664 1.4%

Most occurring scripts

Value Count Frequency (%)
Latin 257841 100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
a 40555 15.7%
c 39879 15.5%
B 33062 12.8%
k 33062 12.8%
l 33062 12.8%
i 20569 8.0%
s 7493 2.9%
n 7493 2.9%
h 6923 2.7%
e 6923 2.7%
Other values (7) 28820 11.2%

Most occurring blocks

Value Count Frequency (%)
ASCII 257841 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
a 40555 15.7%
c 39879 15.5%
B 33062 12.8%
k 33062 12.8%
l 33062 12.8%
i 20569 8.0%
s 7493 2.9%
n 7493 2.9%
h 6923 2.7%
e 6923 2.7%
Other values (7) 28820 11.2%

victim_age
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct 100
Distinct (%) 0.2%
Missing 4080
Missing (%) 8.6%
Infinite 0
Infinite (%) 0.0%
Mean 31.758261
Minimum 0
Maximum 101
Zeros 327
Zeros (%) 0.7%
Negative 0
Negative (%) 0.0%
Memory size 371.0 KiB

Quantile statistics

Minimum 0
5-th percentile 16
Q1 22
median 28
Q3 39
95-th percentile 59
Maximum 101
Range 101
Interquartile range (IQR) 17

Descriptive statistics

Standard deviation 14.283064
Coefficient of variation (CV) 0.44974327
Kurtosis 1.2176166
Mean 31.758261
Median Absolute Deviation (MAD) 8
Skewness 0.91461752
Sum 1378245
Variance 204.00592
Monotonicity Not monotonic
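
Several of these statistics are derived from one another, which makes them easy to sanity-check. The sketch below recomputes the coefficient of variation, the variance, the interquartile range, and the number of non-missing values from the figures reported for victim_age; all inputs are copied from this section.

```python
mean = 31.758261
std = 14.283064
q1, q3 = 22, 39
total = 1378245          # "Sum"
n_observations = 47478
n_missing = 4080

cv = std / mean                        # coefficient of variation
variance = std ** 2
iqr = q3 - q1
n_non_missing = round(total / mean)    # the sum divided by the mean gives the count

print(round(cv, 4), round(variance, 2), iqr, n_non_missing)
```

The recovered count matches the number of observations minus the 4080 missing values.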
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
22 1909 4.0%
21 1866 3.9%
23 1830 3.9%
24 1791 3.8%
19 1746 3.7%
25 1718 3.6%
20 1700 3.6%
26 1597 3.4%
18 1484 3.1%
29 1403 3.0%
Other values (90) 26354 55.5%
(Missing) 4080 8.6%

Value Count Frequency (%)
0 327 0.7%
1 304 0.6%
2 164 0.3%
3 99 0.2%
4 65 0.1%
5 43 0.1%
6 38 0.1%
7 41 0.1%
8 28 0.1%
9 20 < 0.1%

Value Count Frequency (%)
101 1 < 0.1%
99 1 < 0.1%
97 3 < 0.1%
96 2 < 0.1%
95 4 < 0.1%
94 5 < 0.1%
93 5 < 0.1%
92 5 < 0.1%
91 9 < 0.1%
90 11 < 0.1%

victim_sex
Categorical

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 371.0 KiB
Male 40387
Female 7091

Length

Max length 6
Median length 4
Mean length 4.2987068
Min length 4

Characters and Unicode

Total characters 204094
Distinct characters 6
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Male
2nd row Male
3rd row Female
4th row Male
5th row Female

Common Values

Value Count Frequency (%)
Male 40387 85.1%
Female 7091 14.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
male 40387 85.1%
female 7091 14.9%

Most occurring characters

Value Count Frequency (%)
e 54569 26.7%
a 47478 23.3%
l 47478 23.3%
M 40387 19.8%
F 7091 3.5%
m 7091 3.5%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 156616 76.7%
Uppercase Letter 47478 23.3%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
e 54569 34.8%
a 47478 30.3%
l 47478 30.3%
m 7091 4.5%

Uppercase Letter
Value Count Frequency (%)
M 40387 85.1%
F 7091 14.9%

Most occurring scripts

Value Count Frequency (%)
Latin 204094 100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
e 54569 26.7%
a 47478 23.3%
l 47478 23.3%
M 40387 19.8%
F 7091 3.5%
m 7091 3.5%

Most occurring blocks

Value Count Frequency (%)
ASCII 204094 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
e 54569 26.7%
a 47478 23.3%
l 47478 23.3%
M 40387 19.8%
F 7091 3.5%
m 7091 3.5%

city
Categorical

HIGH CORRELATION 

Distinct 47
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory size 371.0 KiB
Chicago 5523
Philadelphia 3036
Houston 2908
Baltimore 2827
Detroit 2496
Other values (42) 30688

Length

Max length 14
Median length 12
Mean length 8.9042715
Min length 5

Characters and Unicode

Total characters 422757
Distinct characters 44
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Albuquerque
2nd row Albuquerque
3rd row Albuquerque
4th row Albuquerque
5th row Albuquerque

Common Values

Value Count Frequency (%)
Chicago 5523 11.6%
Philadelphia 3036 6.4%
Houston 2908 6.1%
Baltimore 2827 6.0%
Detroit 2496 5.3%
Los Angeles 2196 4.6%
St. Louis 1661 3.5%
Memphis 1510 3.2%
New Orleans 1394 2.9%
Indianapolis 1321 2.8%
Other values (37) 22606 47.6%

Length

Histogram of lengths of the category
Value Count Frequency (%)
chicago 5523 9.4%
philadelphia 3036 5.2%
houston 2908 4.9%
baltimore 2827 4.8%
detroit 2496 4.2%
san 2212 3.8%
los 2196 3.7%
angeles 2196 3.7%
new 2016 3.4%
st 1661 2.8%
Other values (46) 31794 54.0%

Most occurring characters

Value Count Frequency (%)
a 41419 9.8%
o 37364 8.8%
i 37308 8.8%
e 28270 6.7%
n 27262 6.4%
l 25909 6.1%
s 23953 5.7%
t 23753 5.6%
h 20046 4.7%
r 13444 3.2%
Other values (34) 144029 34.1%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 350844 83.0%
Uppercase Letter 58865 13.9%
Space Separator 11387 2.7%
Other Punctuation 1661 0.4%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
a 41419 11.8%
o 37364 10.6%
i 37308 10.6%
e 28270 8.1%
n 27262 7.8%
l 25909 7.4%
s 23953 6.8%
t 23753 6.8%
h 20046 5.7%
r 13444 3.8%
Other values (13) 72116 20.6%

Uppercase Letter
Value Count Frequency (%)
C 8598 14.6%
L 6106 10.4%
B 5802 9.9%
S 4912 8.3%
A 4273 7.3%
P 3664 6.2%
D 3534 6.0%
M 3451 5.9%
O 3394 5.8%
H 2908 4.9%
Other values (9) 12223 20.8%

Space Separator
Value Count Frequency (%)
(space) 11387 100.0%

Other Punctuation
Value Count Frequency (%)
. 1661 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 409709 96.9%
Common 13048 3.1%

Most frequent character per script

Latin
Value Count Frequency (%)
a 41419 10.1%
o 37364 9.1%
i 37308 9.1%
e 28270 6.9%
n 27262 6.7%
l 25909 6.3%
s 23953 5.8%
t 23753 5.8%
h 20046 4.9%
r 13444 3.3%
Other values (32) 130981 32.0%

Common
Value Count Frequency (%)
(space) 11387 87.3%
. 1661 12.7%

Most occurring blocks

Value Count Frequency (%)
ASCII 422757 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
a 41419 9.8%
o 37364 8.8%
i 37308 8.8%
e 28270 6.7%
n 27262 6.4%
l 25909 6.1%
s 23953 5.7%
t 23753 5.6%
h 20046 4.7%
r 13444 3.2%
Other values (34) 144029 34.1%

state
Categorical

HIGH CORRELATION 

Distinct 27
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory size 371.0 KiB
CA 6195
IL 5523
TX 4282
PA 3664
MD 2827
Other values (22) 24987

Length

Max length 2
Median length 2
Mean length 2
Min length 2

Characters and Unicode

Total characters 94956
Distinct characters 19
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row NM
2nd row NM
3rd row NM
4th row NM
5th row NM

Common Values

Value Count Frequency (%)
CA 6195 13.0%
IL 5523 11.6%
TX 4282 9.0%
PA 3664 7.7%
MD 2827 6.0%
MI 2496 5.3%
TN 2265 4.8%
LA 1817 3.8%
FL 1811 3.8%
OH 1761 3.7%
Other values (17) 14837 31.3%

Length

Histogram of lengths of the category
Value Count Frequency (%)
ca 6195 13.0%
il 5523 11.6%
tx 4282 9.0%
pa 3664 7.7%
md 2827 6.0%
mi 2496 5.3%
tn 2265 4.8%
la 1817 3.8%
fl 1811 3.8%
oh 1761 3.7%
Other values (17) 14837 31.3%

Most occurring characters

Value Count Frequency (%)
A 14581 15.4%
I 10455 11.0%
L 9937 10.5%
C 8752 9.2%
M 8237 8.7%
N 8004 8.4%
T 6547 6.9%
O 4959 5.2%
X 4282 4.5%
D 4135 4.4%
Other values (9) 15067 15.9%

Most occurring categories

Value Count Frequency (%)
Uppercase Letter 93841 98.8%
Lowercase Letter 1115 1.2%

Most frequent character per category

Uppercase Letter
Value Count Frequency (%)
A 14581 15.5%
I 10455 11.1%
L 9937 10.6%
C 8752 9.3%
M 8237 8.8%
N 8004 8.5%
T 6547 7.0%
O 4959 5.3%
X 4282 4.6%
D 4135 4.4%
Other values (8) 13952 14.9%

Lowercase Letter
Value Count Frequency (%)
w 1115 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 94956 100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
A 14581 15.4%
I 10455 11.0%
L 9937 10.5%
C 8752 9.2%
M 8237 8.7%
N 8004 8.4%
T 6547 6.9%
O 4959 5.2%
X 4282 4.5%
D 4135 4.4%
Other values (9) 15067 15.9%

Most occurring blocks

Value Count Frequency (%)
ASCII 94956 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
A 14581 15.4%
I 10455 11.0%
L 9937 10.5%
C 8752 9.2%
M 8237 8.7%
N 8004 8.4%
T 6547 6.9%
O 4959 5.2%
X 4282 4.5%
D 4135 4.4%
Other values (9) 15067 15.9%

lat
Real number (ℝ)

HIGH CORRELATION 

Distinct 40953
Distinct (%) 86.3%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 37.269773
Minimum 25.725214
Maximum 45.05119
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 371.0 KiB

Quantile statistics

Minimum 25.725214
5-th percentile 29.685616
Q1 34.0278
median 38.661111
Q3 40.459645
95-th percentile 42.431242
Maximum 45.05119
Range 19.325976
Interquartile range (IQR) 6.4318452

Descriptive statistics

Standard deviation 4.3377485
Coefficient of variation (CV) 0.11638784
Kurtosis -0.68894128
Mean 37.269773
Median Absolute Deviation (MAD) 3.1492084
Skewness -0.5834027
Sum 1769494.3
Variance 18.816062
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
30.302565 15 < 0.1%
37.7503065 14 < 0.1%
34.075761 14 < 0.1%
38.8741662 12 < 0.1%
34.1016 12 < 0.1%
37.506284 11 < 0.1%
33.9456 11 < 0.1%
34.2085 11 < 0.1%
41.8645202 10 < 0.1%
41.794929 10 < 0.1%
Other values (40943) 47358 99.7%

Value Count Frequency (%)
25.7252139 1 < 0.1%
25.7262775 1 < 0.1%
25.7273453 1 < 0.1%
25.7280792 1 < 0.1%
25.7305986 1 < 0.1%
25.7310598 1 < 0.1%
25.7328853 1 < 0.1%
25.7398679 2 < 0.1%
25.7400967 1 < 0.1%
25.7463239 1 < 0.1%

Value Count Frequency (%)
45.05119 1 < 0.1%
45.05052 1 < 0.1%
45.04835 1 < 0.1%
45.04752 1 < 0.1%
45.04471 1 < 0.1%
45.04333 1 < 0.1%
45.04293 1 < 0.1%
45.04223 1 < 0.1%
45.04155 2 < 0.1%
45.03765 2 < 0.1%

lon
Real number (ℝ)

HIGH CORRELATION 

Distinct 40640
Distinct (%) 85.6%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean -90.846625
Minimum -122.50778
Maximum -71.011519
Zeros 0
Zeros (%) 0.0%
Negative 47478
Negative (%) 100.0%
Memory size 371.0 KiB

Quantile statistics

Minimum -122.50778
5-th percentile -121.26209
Q1 -95.501941
median -87.653205
Q3 -81.649653
95-th percentile -75.149038
Maximum -71.011519
Range 51.49626
Interquartile range (IQR) 13.852288

Descriptive statistics

Standard deviation 13.903806
Coefficient of variation (CV) -0.15304703
Kurtosis 0.089284055
Mean -90.846625
Median Absolute Deviation (MAD) 7.6327799
Skewness -1.0492842
Sum -4313216.1
Variance 193.31582
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
-118.2739 62 0.1%
-118.2827 31 0.1%
-118.309 27 0.1%
-118.2783 25 0.1%
-118.3089 21 < 0.1%
-118.2695 20 < 0.1%
-118.2893 16 < 0.1%
-118.2651 16 < 0.1%
-81.732164 15 < 0.1%
-118.2871 15 < 0.1%
Other values (40630) 47230 99.5%

Value Count Frequency (%)
-122.507779 1 < 0.1%
-122.5046043 1 < 0.1%
-122.5032863 1 < 0.1%
-122.4920669 1 < 0.1%
-122.4884236 1 < 0.1%
-122.487628 1 < 0.1%
-122.4854123 1 < 0.1%
-122.484382 1 < 0.1%
-122.4842604 1 < 0.1%
-122.4826038 1 < 0.1%

Value Count Frequency (%)
-71.0115188 1 < 0.1%
-71.0123576 1 < 0.1%
-71.0192427 1 < 0.1%
-71.0306083 1 < 0.1%
-71.0312049 1 < 0.1%
-71.0319745 1 < 0.1%
-71.0325517 1 < 0.1%
-71.0329989 1 < 0.1%
-71.034274 1 < 0.1%
-71.034615 1 < 0.1%

disposition
Categorical

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 371.0 KiB
No Arrest 24258
Arrest Made 23220

Length

Max length 11
Median length 9
Mean length 9.9781372
Min length 9

Characters and Unicode

Total characters 473742
Distinct characters 11
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row No Arrest
2nd row Arrest Made
3rd row No Arrest
4th row Arrest Made
5th row No Arrest

Common Values

Value Count Frequency (%)
No Arrest 24258
51.1%
Arrest Made 23220
48.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
arrest 47478 50.0%
no 24258 25.5%
made 23220 24.5%

Most occurring characters

Value Count Frequency (%)
r 94956 20.0%
e 70698 14.9%
(space) 47478 10.0%
A 47478 10.0%
s 47478 10.0%
t 47478 10.0%
N 24258 5.1%
o 24258 5.1%
M 23220 4.9%
a 23220 4.9%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 331308 69.9%
Uppercase Letter 94956 20.0%
Space Separator 47478 10.0%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
r 94956 28.7%
e 70698 21.3%
s 47478 14.3%
t 47478 14.3%
o 24258 7.3%
a 23220 7.0%
d 23220 7.0%

Uppercase Letter
Value Count Frequency (%)
A 47478 50.0%
N 24258 25.5%
M 23220 24.5%

Space Separator
Value Count Frequency (%)
(space) 47478 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 426264 90.0%
Common 47478 10.0%

Most frequent character per script

Latin
Value Count Frequency (%)
r 94956 22.3%
e 70698 16.6%
A 47478 11.1%
s 47478 11.1%
t 47478 11.1%
N 24258 5.7%
o 24258 5.7%
M 23220 5.4%
a 23220 5.4%
d 23220 5.4%

Common
Value Count Frequency (%)
(space) 47478 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 473742 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
r 94956 20.0%
e 70698 14.9%
(space) 47478 10.0%
A 47478 10.0%
s 47478 10.0%
t 47478 10.0%
N 24258 5.1%
o 24258 5.1%
M 23220 4.9%
a 23220 4.9%

reported_year
Real number (ℝ)

Distinct 11
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 2012.3466
Minimum 2007
Maximum 2017
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 371.0 KiB

Quantile statistics

Minimum 2007
5-th percentile 2007
Q1 2010
median 2012
Q3 2015
95-th percentile 2017
Maximum 2017
Range 10
Interquartile range (IQR) 5

Descriptive statistics

Standard deviation 3.153443
Coefficient of variation (CV) 0.0015670476
Kurtosis -1.2016741
Mean 2012.3466
Median Absolute Deviation (MAD) 3
Skewness -0.15546959
Sum 95542194
Variance 9.9442029
Monotonicity Not monotonic
Histogram with fixed size bins (bins=11)
Value Count Frequency (%)
2016 5794 12.2%
2015 4927 10.4%
2017 4517 9.5%
2012 4483 9.4%
2014 4248 8.9%
2010 4238 8.9%
2013 4199 8.8%
2011 4140 8.7%
2007 3807 8.0%
2008 3753 7.9%

Value Count Frequency (%)
2007 3807 8.0%
2008 3753 7.9%
2009 3372 7.1%
2010 4238 8.9%
2011 4140 8.7%
2012 4483 9.4%
2013 4199 8.8%
2014 4248 8.9%
2015 4927 10.4%
2016 5794 12.2%

Value Count Frequency (%)
2017 4517 9.5%
2016 5794 12.2%
2015 4927 10.4%
2014 4248 8.9%
2013 4199 8.8%
2012 4483 9.4%
2011 4140 8.7%
2010 4238 8.9%
2009 3372 7.1%
2008 3753 7.9%

age_range
Categorical

HIGH CORRELATION  MISSING 

Distinct 5
Distinct (%) < 0.1%
Missing 4080
Missing (%) 8.6%
Memory size 371.0 KiB
18-29 19818
30-44 12360
45-64 6459
0-17 3397
65+ 1364

Length

Max length 5
Median length 5
Mean length 4.8588645
Min length 3

Characters and Unicode

Total characters 210865
Distinct characters 12
Distinct categories 3
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 65+
2nd row 0-17
3rd row 0-17
4th row 30-44
5th row 65+

Common Values

Value Count Frequency (%)
18-29 19818 41.7%
30-44 12360 26.0%
45-64 6459 13.6%
0-17 3397 7.2%
65+ 1364 2.9%
(Missing) 4080 8.6%
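
age_range and victim_age share the same 4080 missing values and are flagged as highly correlated, which is consistent with age_range being a binned version of victim_age. The function below is a sketch of such a binning. The bin edges are read off the category labels, and `age_range` as a function is a hypothetical reconstruction, not code taken from the dataset's pipeline.

```python
def age_range(age):
    """Map an age in years to an age_range label; None stays None (missing)."""
    if age is None:
        return None
    if age < 18:
        return "0-17"
    if age < 30:
        return "18-29"
    if age < 45:
        return "30-44"
    if age < 65:
        return "45-64"
    return "65+"


# Ages taken from the sample rows of the report.
for age in (78.0, 17.0, 32.0, 52.0, 22.0, None):
    print(age, age_range(age))
```

The labels produced for these ages match the age_range values shown alongside them in the report's sample rows.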

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
18-29 19818 45.7%
30-44 12360 28.5%
45-64 6459 14.9%
0-17 3397 7.8%
65 1364 3.1%
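
The two frequency tables for age_range report different percentages for the same counts because they use different denominators: the Common Values table divides by all rows, including the 4080 missing ones, while this table divides by non-missing values only. A short sketch of both calculations, using the counts from the report:

```python
counts = {"18-29": 19818, "30-44": 12360, "45-64": 6459, "0-17": 3397, "65+": 1364}
n_missing = 4080

n_present = sum(counts.values())
n_total = n_present + n_missing

for label, count in counts.items():
    share_of_all = 100 * count / n_total
    share_of_present = 100 * count / n_present
    print(f"{label}: {share_of_all:.1f}% of all rows, "
          f"{share_of_present:.1f}% of non-missing rows")
```

The label "65" in this table is the "65+" category with the plus sign dropped, as the matching count of 1364 shows.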

Most occurring characters

Value Count Frequency (%)
- 42034
19.9%
4 37638
17.8%
1 23215
11.0%
8 19818
9.4%
2 19818
9.4%
9 19818
9.4%
0 15757
 
7.5%
3 12360
 
5.9%
5 7823
 
3.7%
6 7823
 
3.7%
Other values (2) 4761
 
2.3%

Most occurring categories

Value Count Frequency (%)
Decimal Number 167467
79.4%
Dash Punctuation 42034
 
19.9%
Math Symbol 1364
 
0.6%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
4 37638
22.5%
1 23215
13.9%
8 19818
11.8%
2 19818
11.8%
9 19818
11.8%
0 15757
9.4%
3 12360
 
7.4%
5 7823
 
4.7%
6 7823
 
4.7%
7 3397
 
2.0%
Dash Punctuation
Value Count Frequency (%)
- 42034
100.0%
Math Symbol
Value Count Frequency (%)
+ 1364
100.0%

Most occurring scripts

Value Count Frequency (%)
Common 210865
100.0%

Most frequent character per script

Common
Value Count Frequency (%)
- 42034
19.9%
4 37638
17.8%
1 23215
11.0%
8 19818
9.4%
2 19818
9.4%
9 19818
9.4%
0 15757
 
7.5%
3 12360
 
5.9%
5 7823
 
3.7%
6 7823
 
3.7%
Other values (2) 4761
 
2.3%

Most occurring blocks

Value Count Frequency (%)
ASCII 210865
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
- 42034
19.9%
4 37638
17.8%
1 23215
11.0%
8 19818
9.4%
2 19818
9.4%
9 19818
9.4%
0 15757
 
7.5%
3 12360
 
5.9%
5 7823
 
3.7%
6 7823
 
3.7%
Other values (2) 4761
 
2.3%

Interactions


Correlations

victim_age lat lon reported_year victim_race victim_sex city state disposition age_range
victim_age 1.000 -0.083 -0.046 0.043 0.129 0.188 0.067 0.063 0.128 0.831
lat -0.083 1.000 0.420 0.006 0.165 0.073 0.979 0.748 0.166 0.055
lon -0.046 0.420 1.000 -0.011 0.236 0.082 1.000 0.882 0.094 0.060
reported_year 0.043 0.006 -0.011 1.000 0.021 0.014 0.136 0.108 0.104 0.022
victim_race 0.129 0.165 0.236 0.021 1.000 0.163 0.305 0.268 0.112 0.123
victim_sex 0.188 0.073 0.082 0.014 0.163 1.000 0.111 0.105 0.102 0.146
city 0.067 0.979 1.000 0.136 0.305 0.111 1.000 1.000 0.240 0.087
state 0.063 0.748 0.882 0.108 0.268 0.105 1.000 1.000 0.224 0.080
disposition 0.128 0.166 0.094 0.104 0.112 0.102 0.240 0.224 1.000 0.105
age_range 0.831 0.055 0.060 0.022 0.123 0.146 0.087 0.080 0.105 1.000
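
The "High correlation" alerts in the overview can be reproduced from this matrix by flagging every off-diagonal entry at or above a cut-off. The sketch below assumes a cut-off of 0.5; that value is not stated in the report and is chosen here because it yields the same set of alerts. `highly_correlated` is a hypothetical helper, not a ydata-profiling function.

```python
names = ["victim_age", "lat", "lon", "reported_year", "victim_race",
         "victim_sex", "city", "state", "disposition", "age_range"]

# Rows copied from the correlation matrix above, in the same order as `names`.
matrix = [
    [1.000, -0.083, -0.046, 0.043, 0.129, 0.188, 0.067, 0.063, 0.128, 0.831],
    [-0.083, 1.000, 0.420, 0.006, 0.165, 0.073, 0.979, 0.748, 0.166, 0.055],
    [-0.046, 0.420, 1.000, -0.011, 0.236, 0.082, 1.000, 0.882, 0.094, 0.060],
    [0.043, 0.006, -0.011, 1.000, 0.021, 0.014, 0.136, 0.108, 0.104, 0.022],
    [0.129, 0.165, 0.236, 0.021, 1.000, 0.163, 0.305, 0.268, 0.112, 0.123],
    [0.188, 0.073, 0.082, 0.014, 0.163, 1.000, 0.111, 0.105, 0.102, 0.146],
    [0.067, 0.979, 1.000, 0.136, 0.305, 0.111, 1.000, 1.000, 0.240, 0.087],
    [0.063, 0.748, 0.882, 0.108, 0.268, 0.105, 1.000, 1.000, 0.224, 0.080],
    [0.128, 0.166, 0.094, 0.104, 0.112, 0.102, 0.240, 0.224, 1.000, 0.105],
    [0.831, 0.055, 0.060, 0.022, 0.123, 0.146, 0.087, 0.080, 0.105, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off; not stated in the report


def highly_correlated(names, matrix, threshold=THRESHOLD):
    """Map each variable to the other variables it correlates with strongly."""
    flagged = {}
    for i, name in enumerate(names):
        partners = [names[j] for j, value in enumerate(matrix[i])
                    if j != i and abs(value) >= threshold]
        if partners:
            flagged[name] = partners
    return flagged


flagged = highly_correlated(names, matrix)
for name, partners in flagged.items():
    print(name, "->", ", ".join(partners))
```

With this cut-off, lat and lon are each flagged against city and state but not against each other (0.420), which matches the "city and 1 other" wording of the alerts.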

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

uid reported_date victim_race victim_age victim_sex city state lat lon disposition reported_year age_range
0 Alb-000001 2010-05-04 Hispanic 78.0 Male Albuquerque NM 35.095788 -106.538555 No Arrest 2010 65+
1 Alb-000002 2010-02-16 Hispanic 17.0 Male Albuquerque NM 35.056810 -106.715321 Arrest Made 2010 0-17
2 Alb-000003 2010-06-01 White 15.0 Female Albuquerque NM 35.086092 -106.695568 No Arrest 2010 0-17
3 Alb-000004 2010-01-01 Hispanic 32.0 Male Albuquerque NM 35.078493 -106.556094 Arrest Made 2010 30-44
4 Alb-000005 2010-01-02 White 72.0 Female Albuquerque NM 35.130357 -106.580986 No Arrest 2010 65+
5 Alb-000006 2010-01-26 White 91.0 Female Albuquerque NM 35.151110 -106.537797 No Arrest 2010 65+
6 Alb-000007 2010-01-27 Hispanic 52.0 Male Albuquerque NM 35.111785 -106.712614 Arrest Made 2010 45-64
7 Alb-000008 2010-01-27 Hispanic 52.0 Female Albuquerque NM 35.111785 -106.712614 Arrest Made 2010 45-64
8 Alb-000009 2010-01-30 White 56.0 Male Albuquerque NM 35.075380 -106.553458 No Arrest 2010 45-64
9 Alb-000010 2010-02-10 Hispanic 43.0 Male Albuquerque NM 35.065930 -106.572288 No Arrest 2010 30-44
uid reported_date victim_race victim_age victim_sex city state lat lon disposition reported_year age_range
47468 Was-001375 2016-02-24 Black 22.0 Male Washington DC 38.843399 -77.000104 Arrest Made 2016 18-29
47469 Was-001376 2016-07-31 Black 25.0 Male Washington DC 38.863322 -76.995309 No Arrest 2016 18-29
47470 Was-001377 2016-09-16 Black 35.0 Male Washington DC 38.845871 -76.998169 Arrest Made 2016 30-44
47471 Was-001378 2016-04-15 Black 37.0 Male Washington DC 38.826458 -77.003590 Arrest Made 2016 30-44
47472 Was-001379 2016-07-15 Black 20.0 Male Washington DC 38.827266 -77.001572 No Arrest 2016 18-29
47473 Was-001380 2016-09-08 Black 29.0 Male Washington DC 38.828704 -77.002075 Arrest Made 2016 18-29
47474 Was-001381 2016-09-13 Black 19.0 Male Washington DC 38.822852 -77.001725 No Arrest 2016 18-29
47475 Was-001382 2016-11-14 Black 23.0 Male Washington DC 38.828025 -77.002511 No Arrest 2016 18-29
47476 Was-001383 2016-11-30 Black 24.0 Male Washington DC 38.820476 -77.008640 No Arrest 2016 18-29
47477 Was-001384 2016-09-01 Black 17.0 Male Washington DC 38.866689 -76.982409 Arrest Made 2016 0-17